Improving Network Performance Through Endpoint Diagnosis And Multipath Communications
Components of networks, and by extension the Internet, can fail. It is therefore important to find the points of failure and resolve existing issues as quickly as possible. Resolution, however, takes time, and it is important to maintain high quality of service (QoS) for existing clients while it is in progress. In this work, our goal is to provide clients with a means of avoiding failures where possible to maintain high QoS, while enabling them to assist in the diagnosis process to shorten the time to recovery.
Fixing a failure relies on first detecting that it has occurred and then identifying where it occurred so that it can be remedied. We take a two-step approach in our solution. First, we identify the entity (client, server, or network) responsible for the failure. Next, if a failure is identified as network related, additional algorithms are triggered to detect the device responsible.
To achieve the first step, we revisit the question: how much can you infer about a failure using TCP statistics collected at one of the endpoints of a connection? Using an agent that captures TCP statistics at one of the endpoints, we devise a classification algorithm that identifies the root cause of failures. Using insights derived from this classification algorithm, we identify the dominant TCP metrics that indicate where and why problems occur. When a failure is identified as a network-related problem, the second step is triggered, in which the algorithm uses additional information collected from "failed" connections to identify the device that caused the failure.
Failures are also disruptive to users' performance, and resolution may take time. It is therefore important to shield clients from their effects as much as possible.
One option for avoiding problems resulting from failures is to rely on multiple paths (they are unlikely to go bad at the same time). The use of multiple paths involves both selecting paths (routing) and using them effectively. The second part of this thesis explores the efficacy of multipath communication in such situations.
Multipath communications are expected to have monetary implications for ISPs and content providers. Our solution therefore aims to minimize such costs to content providers while significantly improving user performance.
A Distributed Routing Protocol for Predictable Rates in Wireless Mesh Networks
Wireless mesh networks hold the promise of rapid and flexible deployments of communication facilities. This potential notwithstanding, the often erratic behavior of multihop wireless transmissions limits the range of applications that such networks can target. In this paper we investigate the feasibility and benefits of a routing protocol explicitly aimed at making wireless mesh networks more predictable while preserving their efficiency and flexibility. The protocol's basic premise is the classical idea that a multipath solution can offer resiliency to unexpected link variations. The paper's contributions are in demonstrating how this can be effectively realized in a wireless context, and in offering initial evidence of its efficacy. In particular, the paper illustrates how routing decisions that account for link variability can be computed in a distributed fashion, and the benefits they afford in improving the stability of end-to-end transmission rates even in the presence of random network fluctuations.
Mitigating the Performance Impact of Network Failures in Public Clouds
Some faults in data center networks require hours to days to repair because they may need reboots, re-imaging, or manual work by technicians. To reduce traffic impact, cloud providers mitigate the effect of faults, for example, by steering traffic to alternate paths. The state of the art in automatic network mitigation uses simple safety checks and proxy metrics to determine mitigations. SWARM, the approach described in this paper, can pick orders of magnitude better mitigations by estimating end-to-end connection-level performance (CLP) metrics. At its core is a scalable CLP estimator that quickly ranks mitigations with high fidelity and, on failures observed at a large cloud provider, outperforms the state of the art by over 700× in some cases.
PacketScope: Monitoring the Packet Lifecycle Inside a Switch
As modern switches become increasingly powerful, flexible, and programmable, network operators have an ever greater need to monitor their behavior. Many existing systems provide the ability to observe and analyze traffic that arrives at switches, but do not provide visibility into the experience of packets within the switch. To fill this gap, we present PacketScope, a network telemetry system that lets us peek inside network switches to ask a suite of useful queries about how switches modify, drop, delay, and forward packets. PacketScope gives network operators an intuitive and powerful Spark-like dataflow language to express these queries. To minimize the overhead of PacketScope on switch metadata, our compiler uses a "tag little, compute early" strategy that tags packets with metadata as they move through the switch pipeline, and computes query results as early as possible to free up pipeline resources for later processing. PacketScope also combines information from the ingress and egress pipelines to answer aggregate queries about packets dropped due to a full queue.